The security of artificial intelligence (AI) is an important research area on the path towards safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.
translated by Google Translate
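Understanding users' intentions and recognizing semantic entities in their sentences, i.e., natural language understanding (NLU), is an upstream task for many natural language processing tasks. One of the main challenges is collecting a sufficient amount of annotated data to train a model. Existing research on text augmentation does not adequately consider entities and therefore performs poorly on NLU tasks. To solve this problem, we propose a novel NLP data augmentation technique, Entity Aware Data Augmentation (EADA), which applies a tree structure, the Entity Aware Syntax Tree (EAST), to represent sentences combined with attention on the entities. Our EADA technique automatically constructs an EAST from a small amount of annotated data and then generates a large number of training instances for intent detection and slot filling. Experimental results on four datasets show that the technique significantly outperforms existing data augmentation methods in terms of both accuracy and generalization ability.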
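Graph neural networks (GNNs) are widely used for graph representation learning. Despite their prevalence, GNNs suffer from two drawbacks in graph classification tasks: neglect of graph-level relationships and a generalization problem. Each graph is processed separately in GNN message passing and graph pooling, and existing methods for addressing overfitting operate on each individual graph. This makes the learned graph representations less effective in downstream classification. In this paper, we propose a Class-Aware Representation Refinement (CARE) framework for graph classification tasks. CARE computes simple yet powerful class representations and injects them to steer the learning of graph representations towards better class separability. CARE is a highly flexible plug-and-play framework that can incorporate arbitrary GNN backbones without significantly increasing the computational cost. We also prove theoretically, through Vapnik-Chervonenkis (VC) dimension analysis, that CARE has a better generalization upper bound than its GNN backbone. Our extensive experiments with 10 well-known GNN backbones on 9 benchmark datasets validate the superiority and effectiveness of CARE over its GNN counterparts.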
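As an illustration of what "computing class representations and injecting them" could look like, here is a minimal sketch. It is not the CARE implementation: the abstract does not say how class representations are computed or injected, so the per-class mean and the similarity-feature injection below are assumptions, and the function names `class_representations` and `inject` are hypothetical.

```python
def class_representations(graph_embeddings, labels):
    """Mean graph embedding per class (an assumed, simple class representation)."""
    sums, counts = {}, {}
    for emb, label in zip(graph_embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for k, value in enumerate(emb):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def inject(graph_embedding, class_reps):
    """Append the dot-product similarity to every class representation.

    Assumed injection scheme: the refined vector is the original embedding
    followed by one similarity score per class, in sorted class order.
    """
    sims = [
        sum(a * b for a, b in zip(graph_embedding, class_reps[c]))
        for c in sorted(class_reps)
    ]
    return list(graph_embedding) + sims
```

In this sketch the graph embeddings would come from any GNN backbone's pooling layer, which is what would make such a refinement step backbone-agnostic.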
The image recapture attack is an effective image manipulation method for erasing certain forensic traces, and when it targets personal document images, it poses a great threat to the security of e-commerce and other web applications. Since current learning-based methods suffer from a serious overfitting problem, in this paper we propose a novel two-branch deep neural network that mines better-generalized recapture artifacts with a designed frequency filter bank and a multi-scale cross-attention fusion module. In extensive experiments, we show that our method achieves better generalization capability than state-of-the-art techniques in different scenarios.
Incremental text-to-speech, also known as streaming TTS, has been increasingly applied to online speech applications that require ultra-low response latency to provide an optimal user experience. However, most existing speech synthesis pipelines deployed on GPUs are still non-incremental, which exposes limitations in high-concurrency scenarios, especially when the pipeline is built with end-to-end neural network models. To address this issue, we present a highly efficient approach to real-time incremental TTS on GPUs with Instant Request Pooling and Module-wise Dynamic Batching. Experimental results demonstrate that the proposed method is capable of producing high-quality speech with a first-chunk latency lower than 80 ms under 100 QPS on a single NVIDIA A10 GPU and significantly outperforms its non-incremental counterpart in both concurrency and latency. Our work reveals the effectiveness of high-performance incremental TTS on GPUs.
Large-scale pre-trained language models (PLMs) bring new opportunities for challenging problems, especially those that need high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail because the generation process lacks sufficient supervision and thus lacks the fast adaptivity of humans. We notice that human reasoning has a dual reasoning framework that consists of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the entire reasoning is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers are used to supervise the evaluation in order to obtain reliable feedback for the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvement over state-of-the-art methods, with up to a 9.8% increase over the best baselines.
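The generator/verifier split can be illustrated with a toy sketch. This is not the CoRe implementation: the stand-in verifier simply rechecks each arithmetic step, and the score-weighted vote in `select_answer` is an assumed way of combining verifier feedback. All names and the data layout below are hypothetical.

```python
import operator
from collections import defaultdict

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def toy_verifier(path):
    """Fraction of steps whose stated result matches the recomputed value."""
    steps = path["steps"]
    correct = sum(1 for a, op, b, result in steps if OPS[op](a, b) == result)
    return correct / len(steps)


def select_answer(paths, verifier):
    """Pick the answer whose reasoning paths get the highest total score."""
    totals = defaultdict(float)
    for path in paths:
        totals[path["answer"]] += verifier(path)
    return max(totals, key=totals.get)


# Candidate reasoning paths, as a generator might sample them (made up here).
# Each step is (left operand, operator, right operand, stated result).
candidate_paths = [
    {"steps": [(3, "*", 4, 12), (12, "+", 5, 17)], "answer": 17},
    {"steps": [(5, "+", 12, 17)], "answer": 17},
    {"steps": [(3, "*", 4, 12), (12, "+", 5, 18)], "answer": 18},
]
print(select_answer(candidate_paths, toy_verifier))  # prints 17
```

In a real system both roles would be played by language models, and the verifier scores could additionally be fed back to the generator as a training signal, as the abstract describes.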
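Nowadays, foundation models have become one of the fundamental infrastructures of artificial intelligence, paving the way towards general intelligence. However, reality presents two urgent challenges: existing foundation models are dominated by the English-language community, and users often have limited resources and therefore cannot always use foundation models. To support the development of the Chinese-language community, we introduce an open-source project called Fengshenbang, led by the research center for Cognitive Computing and Natural Language (CCNL). Our project has comprehensive capabilities, including large pre-trained models, user-friendly APIs, benchmarks, datasets, and more. We wrap all of these into three sub-projects: the Fengshenbang Model, the Fengshen Framework, and the Fengshen Benchmark. The open-source roadmap of Fengshenbang aims to re-evaluate the open-source community of Chinese pre-trained large-scale models and to prompt the development of the entire Chinese large-scale model community. We also want to build a user-centered open-source ecosystem that allows individuals to access the models they need to match their computing resources. Furthermore, we invite companies, universities, and research institutions to collaborate with us to build an ecosystem of large-scale open-source models. We hope that this project will become a foundation of Chinese cognitive intelligence.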
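Named entity recognition (NER) is the task of locating and classifying entities in text. However, the unlabeled entity problem in NER datasets seriously hinders the improvement of NER performance. This paper proposes SCL-RAI to address this problem. First, we use span-based contrastive learning to decrease the distance between span representations with the same label while increasing it for those with different labels, which relieves the ambiguity among entities and improves the robustness of the model to unlabeled entities. Then, we propose retrieval-augmented inference to mitigate the decision boundary shifting problem. Our method significantly outperforms the previous SOTA method by 4.21% and 8.64% F1-score on two real-world datasets.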
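The contrastive objective described above can be sketched with a generic supervised contrastive loss over span vectors. This is a standard formulation, not necessarily the exact loss used in SCL-RAI: the cosine similarity, the temperature value, and the function names are assumptions made for illustration, and span vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def span_contrastive_loss(spans, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive.

    Spans sharing a label are pulled together; spans with different
    labels are pushed apart.
    """
    total, counted = 0.0, 0
    for i, anchor in enumerate(spans):
        others = [j for j in range(len(spans)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # no same-label span to pull towards
        logits = {j: cosine(anchor, spans[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

The loss is small when spans with the same label already lie close together and large when same-label spans are far apart, which is the behavior the abstract relies on to reduce ambiguity among entities.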
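LiDAR point clouds, which are usually scanned by continuously rotating LiDAR sensors, capture the precise geometry of the surrounding environment and are crucial to many autonomous detection and navigation tasks. Although many 3D deep architectures have been developed, efficient collection and annotation of large amounts of point clouds remains a major challenge in the analysis and understanding of point cloud data. This paper presents PolarMix, a point cloud augmentation technique that is simple and generic yet can effectively mitigate the data constraint across different perception tasks and scenarios. PolarMix enriches point cloud distributions and preserves point cloud fidelity via two cross-scan augmentation strategies that cut, edit, and mix point clouds along the scanning direction. The first is scene-level swapping, which exchanges point cloud sectors of two LiDAR scans that are cut along the azimuth axis. The second is instance-level rotation and paste, which crops point instances from one LiDAR scan, rotates them by multiple angles (to create multiple copies), and pastes the rotated point instances into other scans. Extensive experiments show that PolarMix consistently achieves superior performance across different perception tasks and scenarios. In addition, it can work as a plug-and-play component for various 3D deep architectures and also works well for unsupervised domain adaptation.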
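The two augmentation strategies can be sketched in a few lines of plain Python. This is a simplified illustration rather than the authors' code: points are bare `(x, y, z)` tuples without labels or intensity, the sector is assumed not to wrap around ±π, and all function names are hypothetical.

```python
import math


def azimuth(point):
    """Azimuth angle of an (x, y, z) point in radians, in [-pi, pi]."""
    x, y, _ = point
    return math.atan2(y, x)


def in_sector(point, start, end):
    """True if the point's azimuth lies in [start, end); assumes start < end."""
    return start <= azimuth(point) < end


def scene_level_swap(scan_a, scan_b, start, end):
    """Replace the azimuth sector [start, end) of scan_a with that of scan_b."""
    kept = [p for p in scan_a if not in_sector(p, start, end)]
    swapped_in = [p for p in scan_b if in_sector(p, start, end)]
    return kept + swapped_in


def rotate_z(point, angle):
    """Rotate a point about the vertical (z) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def instance_rotate_paste(scan, instance_points, angles):
    """Paste one rotated copy of the instance into the scan per angle."""
    pasted = list(scan)
    for angle in angles:
        pasted.extend(rotate_z(p, angle) for p in instance_points)
    return pasted
```

Rotating about the z axis keeps each pasted instance at its original range and height, so the copy stays consistent with how a rotating sensor would have observed it. In a labeled dataset, the per-point labels would be cut and pasted together with the points.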
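The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributable to the difficulty of matching object queries to relevant regions, caused by the semantic misalignment between object queries and encoded image features. Based on this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where each object query can easily be matched to relevant regions with similar semantics. In addition, SAM-DETR++ searches for multiple representative keypoints and exploits their features for semantic-aligned matching with enhanced representation capacity. Furthermore, SAM-DETR++ can effectively fuse multi-scale features in a coarse-to-fine manner on the basis of the designed semantic-aligned matching. Extensive experiments show that the proposed SAM-DETR++ achieves superior convergence speed and competitive detection accuracy. Moreover, as a plug-and-play method, SAM-DETR++ can complement existing DETR convergence solutions with even better performance, achieving 44.8% AP with only 12 training epochs and 49.1% AP with 50 training epochs on COCO val2017 with ResNet-50. Code is available at https://github.com/zhanggongjie/sam-detr.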